Abstract. Translocation through a nanopore is a new experimental technique to 
probe physical properties of biomolecules. A large body of theoretical and computational 
work exists on the dependence of the time to translocate a single unstructured molecule 
on the length of the molecule. Here, we study the same problem but for RNA molecules 
for which the breaking of the secondary structure is the main barrier for translocation. 
To this end, we calculate the mean translocation time of single-stranded RNA through 
a nanopore of zero thickness and at zero voltage for many randomly chosen RNA 
sequences. We find that the translocation time depends on the length of the RNA molecule 
as a power law. The exponent changes as a function of temperature and exceeds the 
naively expected exponent of two for purely diffusive transport at all temperatures. 
We interpret the power law scaling in terms of diffusion in a one-dimensional energy 
landscape with a logarithmic barrier. 

1. Introduction 

Nanopore technology has opened a completely new window for probing the properties 
of polymers in general and biopolymers in particular [1, 2, 3]. In a nanopore setup two 
macroscopic chambers filled with a buffer solution are separated from each other by 
a wall. Embedded into this wall is a single nanopore, i.e., a hole with a diameter in 
the few nanometer range, connecting the two chambers. When charged polymers are 
added into one chamber, an electric field applied across the nanopore can drive these 
polymers through the pore one by one. Drops in the induced counter ion current due to 
the occlusion of the pore by the translocating polymer allow the translocation dynamics 
of individual polymers to be observed. In recent years, this technique has been applied 
extensively to study DNA [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] and RNA [12] molecules as 
well as proteins [13, 14]. 

The emergence of this new experimental technique has also spurred a lot of 
activity on the theoretical side. There has been particular interest in understanding the 
nonequilibrium statistical mechanics associated with the translocation of unstructured, 
linear polymers, e.g., single-stranded DNA in which all nucleotides are the same [15, 16, 
17, 18, 19, 20, 21, 22, 23]. The quantities of interest are the (experimentally 
measurable) distribution of translocation times, and the asymptotic behavior of the 
typical translocation time as the polymers become very long. 

On the simplest level of description, the translocation of a linear polymer is hindered 
by an entropic barrier. An entropic barrier emerges since the wall separating the two 
chambers effectively separates the polymer into two sections: the trans section which 
has already translocated and the cis section which has yet to translocate. Each of 
these sections is constrained in its motion by the wall, and the constraint is most severe 
when the polymer has translocated half way through the pore. More quantitatively, 
if a polymer with sequence length $N$ is divided into sections of length $m$ and $N - m$, 
respectively, the total number of configurations available to this polymer is reduced 
(compared to a free polymer) by the power law factors $m^{-\gamma_u}$ and $(N - m)^{-\gamma_u}$ [24]. Here, 
the exponent $\gamma_u$ depends on the asymptotic statistical properties of the polymer that 
are affected only by the spatial dimensionality and a possible self-avoidance interaction 
($\gamma_u = 1/2$ for an ideal, noninteracting chain). As a consequence, the entropic barrier 
experienced by the translocating polymer (i.e., the difference in free energy between a 
polymer that just entered the pore ($m = 1$) and a polymer with $m$ bases on the trans 
side) has the shape 

$$F(m) = \gamma\, k_B T \ln\left[\frac{(N - m)\, m}{N}\right] \qquad (1)$$

with $\gamma = \gamma_u \approx 1/2$. The maximum of this barrier at $m = N/2$ depends logarithmically 
on $N$, with $\gamma$ as a prefactor. Modeling the translocation process as a one-dimensional 
diffusion across this entropic barrier is an appropriate description, if the translocation 
process is adiabatically slow, e.g. due to friction at the pore, such that the polymer ends 
on each side sample many different configurations during the time required to translocate 
a macroscopic portion of the polymer. It has been established that if entropy reduction 
is the only barrier, translocation is purely diffusive (i.e., the translocation times scale 
as $N^2$) in the limit of zero voltage and ballistic (i.e., the translocation times scale as 
$N$ for long polymers) at finite voltages independent of the characteristics of the polymer 
model (i.e., independent of the precise value of the exponent $\gamma_u$) [23]. However, there 
is still an ongoing debate about what effect the actual polymer dynamics may have on the 
translocation time distributions under conditions where the adiabatic approximation 
breaks down, such that the polymer dynamics is directly coupled to the translocation 
dynamics [iHl EH ESI |26l EH EH [29] . 

Here, we focus on a different, but similarly challenging theoretical question, which 
arises when the translocating polymers are structured heteropolymers. This issue has 
received some experimental [5, 6, 7] and theoretical [7, 30, 31] attention, but far less than 
the case of unstructured molecules. In particular, we consider a polynucleotide, an RNA 
or a single-stranded DNA, consisting of a specific sequence of individual nucleotides, 
i.e. A, C, G, and U for the case of RNA. For simplicity, we will loosely use 'RNA' 
to refer to both RNA and single-stranded DNA in this article, as the biochemical 
difference between these polynucleotides is insignificant for the questions we address. 
RNA molecules have a strong propensity to form intramolecular Watson-Crick, i.e., 
G-C and A-U, base pairs. The formation of such base pairs forces the molecules to 
fold into sequence-dependent structures, which are characterized by their basepairing 
pattern. The naturally evolved sequences of structural RNA's, e.g. ribosomal RNA, 
are biased to stably fold into particular, functional structures, whereas the sequences 
of many other RNA's, e.g. most messenger RNA's, primarily encode information, not 
structure. The structural features of this latter class can be modelled via the ensemble of 
random RNA sequences [32, 33, 34]. Here, we characterize the translocation dynamics of 
this class, focusing on the slow translocation limit. We identify nontrivial translocation 
behavior, and study the physical origin of this behavior. 

Even with a random sequence, a single RNA molecule may spend most of the time 
in a dominant basepairing pattern ('glassy behavior' [32]). Or else it may sample a 
promiscuous array of alternative structures with different shapes [35]. The transition 
between these two types of behavior occurs as a function of temperature, with low 
temperatures favoring glassy behavior [HH [36l [37] . It is interesting to ask whether this 
transition is reflected also in the translocation behavior, and if so, how? 

Generally, if a folded molecule is to translocate through a very narrow pore that 
allows only single strands to pass, it has to break its base pairs in the process. This yields 
a coupling between the observed translocation dynamics and the base pairing properties 
of the molecule [31] . In this system, the separation of the polymer into a cis and a trans 
section has an additional effect, namely that bases on each side of the pore can only pair 
with bases on the same side of the pore thus limiting the possible pairing partners. On 
average, this restriction in the base pairing pattern again is believed to lead to a free 
energy barrier that is logarithmic in the length of the polymer (see below and [33]). Thus, 
at least superficially, the problem of a structured RNA molecule translocating through a 
nanopore is mathematically similar to the problem of homopolymer translocation, even 



Anomalous scaling in nanopore translocation of structured heteropolymers 



[Figure 1 sketch; surviving labels: $N$, $m(t)$ bases.] 



Figure 1. Sketch of a structured RNA molecule translocating through a narrow pore, 
which allows single but not double strands to pass. Translocation can be driven by an 
applied voltage acting on the negative charges of the RNA backbone. An appropriate 
reaction coordinate for the translocation process is the number of bases m that have 
reached the trans side. If the translocation is sufficiently slow, for instance due to 
molecular friction at the pore or energetic barriers caused by basepairing, m becomes 
the only relevant degree of freedom. In this slow translocation limit, there is sufficient 
time for the base-pairing patterns on the cis and trans sides to reoptimize whenever 
m changes. 



though the physical origin of the logarithmic barrier is completely different in nature. 
However, the problem is deeper than this analogy suggests: while the logarithmic barrier 
is insignificant for the translocation of homopolymers (see above), we will see below 
that for structured heteropolymers the translocation dynamics is drastically affected. 
This is a consequence of the fact that in the structured case, the prefactor $\gamma$ of the 
logarithmic barrier is both bigger in magnitude (such that it exceeds a critical threshold) 
and dependent on temperature. 

The rest of this manuscript is organized as follows. In Sec. 2 we lay out our 
model assumptions and the general theoretical framework used here to describe the 
translocation dynamics, review the relevant aspects of the statistical physics of RNA 
folding, and then link the folding and translocation characteristics of random RNA. In 
Sec. 3 we first explore the translocation dynamics of random RNA sequences numerically, 
and identify an anomalous scaling of the typical translocation time with the length of 
the RNA. Then, we provide some theoretical insight into the origin of this anomalous 
scaling in the discussion. Sec. 4 summarizes our results and provides an outlook to 
future work. 

2. Materials and methods 

2.1. Translocation dynamics: general framework 

As illustrated in figure 1, we consider a polynucleotide translocating from the cis to 
the trans side of a pore in a membrane. The pore is so narrow that only a single 
strand of the polynucleotide can pass through, and hence only unpaired bases can enter 
the pore. If an external electric voltage V is applied across the pore, translocation is 
biased towards the positive terminal, since RNA has a negatively charged backbone. 
The translocation process has a natural "reaction coordinate": the number of bases 
$m(t)$ that have reached the trans side at time $t$. For simplicity we will consider an ideal 
pore with a negligible depth, i.e. we assume that the remaining $N - m$ bases are all 
exposed on the cis side and none reside within the pore. In general, the translocation 
process cannot be described solely by a dynamic equation for the coordinate $m$, since 
the spatial and basepairing degrees of freedom of the polymer are coupled to $m(t)$, 
see [31]. However, under conditions where the translocation process is sufficiently slow, 
the translocation dynamics becomes effectively one-dimensional, as $m(t)$ reduces to a 
stochastic hopping process in an appropriate one-dimensional free energy landscape 
$F(m)$. Such a description is appropriate if the base-pairing patterns on the cis and 
trans sides have sufficient time to reoptimize whenever m changes. Slow translocation 
arises when the molecular friction at the pore is large, the voltage bias V is small, and 
the energetic barriers due to basepairing are significant. Throughout the present paper, 
we focus entirely on this slow translocation limit. 

With the above assumptions, the stochastic translocation process is described by 
a master equation for $P(m,t)$, the probability to find an RNA molecule with a given 
sequence in a state with $m$ bases on the trans side at time $t$. This master equation takes 
the general form 

$$\partial_t P(m,t) = k_+(m-1)\, P(m-1,t) + k_-(m+1)\, P(m+1,t) - \left[k_+(m) + k_-(m)\right] P(m,t) \qquad (2)$$

with a set of "hopping" rates $k_+(m)$ and $k_-(m)$ that depend explicitly on the 
translocation coordinate $m$. Here, $k_+(m)$ is the rate to translocate the base with 
index $m + 1$ from the cis to the trans side, whereas $k_-(m)$ is the rate at which base 
$m$ translocates back from the trans to the cis side. The hopping rates also depend 
on the voltage bias $V$, the temperature $T$, and the nucleotide sequence of the RNA. 
In other words, at a given voltage bias and temperature, we need to obtain a set of 
$2N$ hopping rates for each RNA sequence, such that (2) describes the translocation 
dynamics. We then want to characterize the translocation behavior for the ensemble of 
random sequences. 

If the $m$-dependence of the hopping rates is dropped, $k_+(m) = k_+$ and $k_-(m) = k_-$, 
(2) describes a homogeneous drift-diffusion process and becomes equivalent to the 
Fokker-Planck equation 

$$\partial_t P(x,t) = D\, \partial_x^2 P(x,t) - v\, \partial_x P(x,t) \qquad (3)$$

in the continuum limit, where $m$ is replaced by a continuous reaction coordinate 
$0 < x < N$. Here, $D$ and $v$ are the effective diffusion constant and drift velocity, 
respectively. As was shown by Lubensky and Nelson [38], past translocation experiments 
with unstructured single-stranded polynucleotides are quantitatively consistent with (3): 
The experimental distribution of translocation times $p(\tau)$ is well described by the 
corresponding distribution from (3), which is determined by the probability current 
into the absorbing boundary at $x = N$. 

For structured RNA's, we express the hopping rates of (2) more explicitly in the 
form 

$$k_+(m) = k_0\, w_{\mathrm{cis}}(m)\, \exp\left[\frac{\eta\, q_{\mathrm{eff}} V}{k_B T}\right], \qquad k_-(m) = k_0\, w_{\mathrm{trans}}(m)\, \exp\left[-\frac{(1-\eta)\, q_{\mathrm{eff}} V}{k_B T}\right]. \qquad (4)$$

Here $k_0$ denotes the basic "attempt" rate for the translocation of a single unpaired 
base, while $w_{\mathrm{cis}}(m)$ and $w_{\mathrm{trans}}(m)$ denote the probability that the base attempting 
to translocate is indeed not paired. The exponential (Arrhenius) factors account 
for the voltage bias $V$ across the pore, which acts on the effective charge $q_{\mathrm{eff}}$ of a 
nucleotide [9, 39] (note that the applied voltage drops primarily directly across the pore, 
while the nucleotides do not experience a significant electrostatic force on either side). 
The dimensionless factor $\eta$ is a measure for the position of the microscopic transition 
state that limits the rate for the crossing of a single nucleotide. More precisely, $\eta$ is the 
relative distance of this transition state from the entrance of the pore; for a symmetric 
pore, $\eta = 1/2$ (see figure 2). 

Figure 2. Illustration of the voltage-dependence of the translocation rates in (4) 
of the main text. Even in the absence of secondary structure, the translocation of 
a single base is envisaged as a barrier crossing process. The coarse-grained, discrete 
reaction coordinate $m$ (the number of translocated bases) then corresponds to the 
minima of a continuous microscopic free energy landscape. The distance between the minima 
reflects the base-to-base distance $b$ of the RNA. The position of the transition state, at 
a fractional distance $\eta$ from the minimum to the left, is an unknown microscopic 
parameter which determines how the biasing effect of the applied voltage is split 
between the forward and reverse translocation rates: When the unbiased landscape 
of (a) is tilted by the applied voltage as shown in (b), the reduction in the free energy 
barrier for forward translocation is proportional to $\eta$, while the increase in the barrier 
for reverse translocation is proportional to $(1 - \eta)$. 

For a fixed but arbitrary set of hopping rates $k_+(m)$, $k_-(m)$, the (thermal) average 
of the translocation time can be calculated analytically using the mean first passage 
time formalism [40]. One obtains 

$$\langle \tau \rangle = \sum_{m=m_0}^{N-1} \sum_{\ell=0}^{m} \frac{1}{k_+(\ell)} \prod_{j=\ell+1}^{m} \frac{k_-(j)}{k_+(j)}. \qquad (5)$$

This equation assumes that at time $t = 0$, the translocation process has already 
proceeded to the translocation coordinate $m(0) = m_0$. While the entire translocation 
process consists of an entrance stage (with possible failed attempts), followed by a 
passage stage, our focus here is only on the latter. More precisely, we are interested 
in the detailed passage dynamics of the successful translocation events. Equation (5) 
assumes reflecting boundary conditions at $m = 0$, i.e. the molecule is only allowed 
to exit the pore on the trans side, as in previous theoretical studies [18], [231 EHl [26] . 
Experimentally, this corresponds to a situation where, e.g., a protein or a small bead is 
attached to the trans end of the molecule, preventing exit to the cis side. In particular at 
low driving voltages, such a "road block" will be experimentally required, since otherwise 
it would not be possible to separate failed translocation attempts from full translocation 
events to the trans side. At larger driving voltages, the boundary condition at $m = 0$ 
is expected to be less relevant, since molecules are then unlikely to exit the pore on the 
cis side once they are inserted into the pore. At the other end, $m = N$, (5) assumes an 
absorbing boundary, i.e., the translocation time $\tau$ is defined as the time when the state 
$m = N$ is first reached. 
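
The boundary conditions described above (reflecting at $m = 0$, absorbing at $m = N$) can be illustrated with a short numerical sketch. The function names and the list-based layout of the rates are illustrative assumptions, not part of the original work. The first function uses the standard recursion for the mean time to advance by one step in a one-dimensional hopping process; the second evaluates the equivalent explicit double sum, so the two can be checked against each other.

```python
def mean_first_passage_time(k_plus, k_minus, m0=0):
    """Mean time to first reach m = N, starting from m0 (reflecting wall at m = 0).

    k_plus[m]  : hopping rate m -> m + 1, for m = 0 .. N - 1
    k_minus[m] : hopping rate m -> m - 1, for m = 0 .. N - 1
                 (k_minus[0] is never used because of the reflecting wall)
    """
    n = len(k_plus)
    total = 0.0
    step_time = 0.0  # mean time to get from m to m + 1 for the first time
    for m in range(n):
        back = k_minus[m] * step_time if m > 0 else 0.0
        step_time = (1.0 + back) / k_plus[m]
        if m >= m0:
            total += step_time
    return total


def mean_first_passage_time_double_sum(k_plus, k_minus, m0=0):
    """Same quantity, written as an explicit double sum over m and l."""
    n = len(k_plus)
    total = 0.0
    for m in range(m0, n):
        for l in range(m + 1):
            term = 1.0 / k_plus[l]
            for j in range(l + 1, m + 1):
                term *= k_minus[j] / k_plus[j]
            total += term
    return total


if __name__ == "__main__":
    n, k = 10, 2.0
    # Unbiased, homogeneous hopping gives N (N + 1) / (2 k), i.e. diffusive N^2 growth.
    print(mean_first_passage_time([k] * n, [k] * n))
    # Purely forward hopping gives N / k, i.e. ballistic growth.
    print(mean_first_passage_time([k] * n, [0.0] * n))
```

The recursion costs O(N) operations, whereas the naive double sum costs O(N^3) as written, which is why the recursive form is the more practical choice for long sequences.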

To determine the hopping rates (4) for an RNA molecule with a given sequence, we 
first need to calculate the probabilities $w_{\mathrm{cis}}(m)$ and $w_{\mathrm{trans}}(m)$. To this end, we review in 
the following section the physics of RNA folding and the characteristics of random RNA 
sequences, before we return to link these characteristics to the translocation dynamics 
in Sec. 2.3. 

2.2. Folding of random RNA sequences 

In this section, we will review the aspects of the statistical physics of structures of 
random RNA molecules that are relevant for our study. We will follow the bulk of the 
previous literature and exclusively focus on RNA secondary structures [321 [331 EEl [371 
[4TI [42l [43] . An RNA secondary structure is the collection of all base pairs formed by a 
molecule. Formally, it can be described as a set $S = \{(i_1, j_1), \ldots, (i_n, j_n)\}$ of all pairs of 
indices $(i_k, j_k)$ (with $i_k < j_k$) of bases that are paired. A pairing configuration is only 
considered to be a valid secondary structure if it fulfils two conditions: (i) Each base is 
paired with at most one other base. (ii) If $(i, j)$ is a base pair and $(k, l)$ is a base pair 
with $i < k$, they have to be either nested, i.e., fulfil $i < k < l < j$, or independent, 
i.e., fulfil $i < j < k < l$. Forbidden base pairing configurations with $i < k < j < l$ are 
called pseudo-knots. Restricting the allowable secondary structures to only those that 
contain neither base triplets nor pseudo-knots is an approximation since both structural 
elements do occur in actual structures. However, the approximation is reasonable since 
base triplets and pseudo-knots are believed to be rare in natural structures and can be 
effectively suppressed by performing experiments in the absence of multi-valent ions, 
such as Mg$^{2+}$ [44]. 
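
The two validity conditions can be made concrete with a small checker. This is a minimal sketch; the function name and the representation of a structure as a list of index tuples are assumptions made for illustration.

```python
def is_valid_secondary_structure(pairs):
    """Check the two conditions for a valid secondary structure.

    pairs: iterable of (i, j) index tuples of paired bases.
    Returns False if a base is paired more than once (base triplet)
    or if two pairs cross each other (pseudo-knot).
    """
    # Normalise every pair so that i < j, then sort by the opening index.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)

    # Condition (i): each base takes part in at most one pair.
    seen = set()
    for i, j in norm:
        if i == j or i in seen or j in seen:
            return False
        seen.update((i, j))

    # Condition (ii): any two pairs are nested or independent.
    # After sorting, a later pair (k, l) always has k > i, so the
    # forbidden pseudo-knot pattern is i < k < j < l.
    for a, (i, j) in enumerate(norm):
        for k, l in norm[a + 1:]:
            if i < k < j < l:
                return False
    return True


if __name__ == "__main__":
    print(is_valid_secondary_structure([(1, 10), (2, 5), (6, 9)]))  # nested and independent pairs
    print(is_valid_secondary_structure([(1, 5), (3, 8)]))           # crossing pairs (pseudo-knot)
    print(is_valid_secondary_structure([(1, 5), (5, 9)]))           # base 5 paired twice
```

The pairwise comparison is quadratic in the number of base pairs, which is adequate for illustration; a stack-based scan would do the same check in linear time.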

The energy $E[S]$ of a given structure $S$ depends on the sequence $b_1 \ldots b_N$ of the 
RNA molecule. For quantitative analyses such as the prediction of the actual secondary 
structure of an RNA molecule [15], US], very detailed energy models with hundreds of 
parameters have been developed [17]. Since we are interested in more generic questions 
such as the scaling behavior of translocation times, we will use a strongly simplified 
energy model that focuses on the base pairing alone. More specifically, we will assign 
an energy solely derived from the base pairs formed in the structure, i.e., 

$$E[S] = \sum_{(i,j) \in S} \varepsilon_{ij} \qquad (6)$$

where $\varepsilon_{ij}$ is the energy for the formation of a base pair between base $b_i$ and $b_j$. 
Following [311 EZ] we will even ignore the differences between the stability of different 
Watson-Crick base pairs and use the simplest possible model 

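$$\varepsilon_{ij} = \begin{cases} -\varepsilon_m & \text{if } b_i \text{ and } b_j \text{ are a Watson-Crick pair} \\ \varepsilon_{mm} & \text{otherwise} \end{cases} \qquad (7)$$

where the match and mismatch energies $\varepsilon_m$ and $\varepsilon_{mm}$ are positive constants. Such a 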
simplified energy model clearly is not suitable for the quantitative prediction of the 
behavior of an individual RNA molecule. However, the universal properties of the RNA 
folding problem, such as the thermodynamic phases, the topology of the phase diagram, 
and the critical exponents characterizing these phases in the thermodynamic limit are 
expected to be correctly captured. 

For this as well as other more complicated energy models, the partition function of 
an RNA molecule of a given sequence $b_1 \ldots b_N$ can be calculated exactly in polynomial 
time [48]. This can be done by introducing as an auxiliary quantity the partition 
function $Z_{i,j}$ for the substrand $b_i \ldots b_j$ of the original molecule. The $j$th base can 
either be unpaired or paired with the $k$th base, where $k$ takes all of the possibilities 
from $i$ to $j - 1$. If the $j$th base is unpaired, the allowable structures are exactly the 
allowable structures for the substrand $b_i \ldots b_{j-1}$. If the $j$th base is paired with the $k$th 
base, the exclusion of pseudo-knots implies that in the presence of the $(k,j)$ base pair, 
any structure is possible on the substrand $b_i \ldots b_{k-1}$ and on the substrand $b_{k+1} \ldots b_{j-1}$ 
but base pairs between these two substrands are forbidden. That yields the recursion 
equation 

$$Z_{i,j} = Z_{i,j-1} + \sum_{k=i}^{j-1} Z_{i,k-1}\, e^{-\beta \varepsilon_{kj}}\, Z_{k+1,j-1} \qquad (8)$$

where $\beta = (k_B T)^{-1}$. Since the substrands referred to on the right hand side of this 
equation are shorter than the substrands referred to on the left hand side, this recursion 
equation can be used to start from the trivial single and two base substrands and 
calculate the partition functions for the increasingly larger substrands. The partition 
function $Z_{1,N}$ is then the partition function of the whole molecule. Since in this process 
$O(N^2)$ of the $Z_{i,j}$ have to be calculated with each calculation requiring one summation 
over the index $k$, the total computational complexity for this calculation is $O(N^3)$. 
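
The recursion can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the implementation used in this work: it uses 0-based, half-open indices (so `z[i][j]` covers bases `i` to `j - 1`, and an empty substrand has weight 1), it applies the simplified match/mismatch energy model, and the names `partition_table` and `pair_weight` are hypothetical.

```python
import math

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pair_weight(b1, b2, eps_m, eps_mm, kT):
    """Boltzmann weight exp(-eps / kT) of a pair in the match/mismatch model."""
    eps = -eps_m if (b1, b2) in WATSON_CRICK else eps_mm
    return math.exp(-eps / kT)


def partition_table(seq, eps_m=1.0, eps_mm=1.0, kT=1.0):
    """Return z with z[i][j] = partition function of bases i .. j-1 (0-based)."""
    n = len(seq)
    # Empty and single-base substrands have exactly one structure (weight 1).
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # build up from short to long substrands
        for i in range(n - length + 1):
            j = i + length                  # last base of the substrand is j - 1
            total = z[i][j - 1]             # case 1: last base unpaired
            for k in range(i, j - 1):       # case 2: last base paired with base k
                w = pair_weight(seq[k], seq[j - 1], eps_m, eps_mm, kT)
                total += z[i][k] * w * z[k + 1][j - 1]
            z[i][j] = total
    return z


if __name__ == "__main__":
    z = partition_table("GGGAAACCC", eps_m=1.0, eps_mm=1.0, kT=0.5)
    print(z[0][9])  # partition function of the whole 9-base molecule
```

The three nested loops make the cubic cost explicit. For sequences of more than a few hundred bases the plain floating point weights used here can overflow at low temperature, so a practical implementation would presumably rescale the weights or work with logarithms.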

Through various numerical and analytical arguments it has been established that 
RNA secondary structures undergo a glass transition between a high temperature molten 
and a low temperature glassy phase [32l Ell EHl ETJ US], HHl [50] . In the (high temperature) 
molten phase the energetic differences between different structures become irrelevant 
and configurational entropy is the main contributor to the free energy of the structural 
ensemble [35] (it is to be noted that our simplified model of RNA secondary structures 
does not show a denaturation transition where base pairing itself becomes unfavorable 
and the molecule becomes completely unstructured. Thus, "high temperature" in terms 
of real RNA molecules refers to temperatures still below the denaturation temperature, 
but close enough so that the energetic differences between different base pairs are 
smeared out). In the glassy phase, one or a few structures (determined by the specific 
sequence of the molecule) become dominant in the thermal ensemble — the molecule 
"freezes" into those structures. 

The molten (high temperature) phase of RNA secondary structures is completely 
understood analytically [35]. Since in the molten phase by definition the base pairing 
energetics do not play a role any more, the behavior of the molten phase can be 
determined by setting all base pairing energies equal, i.e., by choosing $\varepsilon_{ij} = -\varepsilon_0$ with 
some positive $\varepsilon_0$. Under this choice the partition functions $Z_{i,j}$ no longer depend on the 
nucleotide sequence and thus become translationally invariant, i.e., $Z_{i,j} = Z(j - i + 1)$. 
The recursion equation (8) then simplifies to 

$$Z(N + 1) = Z(N) + q \sum_{k=1}^{N} Z(k - 1)\, Z(N - k) \qquad (9)$$

where $q = \exp(\beta \varepsilon_0)$ is the Boltzmann factor associated with a base pair. This recursion 
equation can be solved in the limit of large $N$ and yields 

$$Z(N) \approx A\, N^{-\gamma_m} z_0^N \qquad (10)$$

where $A$ and $z_0$ depend on the Boltzmann factor $q$. The exponent $\gamma_m = 3/2$, however, 
is universal and is characteristic of the molten phase. 
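
The asymptotic form can be checked numerically by iterating the molten-phase recursion directly. The sketch below is only illustrative: it assumes $Z(0) = Z(1) = 1$, restricts $q$ to integers so that Python's exact integer arithmetic can be used, and estimates the exponent from the combination $Z(N+1) Z(N-1) / Z(N)^2$, in which the factors $A$ and $z_0$ cancel. The function names are hypothetical.

```python
import math
from fractions import Fraction


def molten_partition(n_max, q=1):
    """Return [Z(0), ..., Z(n_max)] from the molten-phase recursion (exact integers)."""
    z = [1, 1]  # assumed starting values: Z(0) = Z(1) = 1
    for n in range(1, n_max):
        z.append(z[n] + q * sum(z[k - 1] * z[n - k] for k in range(1, n + 1)))
    return z


def local_exponent(z, n):
    """Estimate the power-law exponent of Z(N) ~ A N^(-gamma) z0^N around N = n.

    For this form, Z(n+1) Z(n-1) / Z(n)^2 = (1 - 1/n^2)^(-gamma),
    so A and z0 drop out and gamma follows from a ratio of logarithms.
    """
    ratio = Fraction(z[n + 1] * z[n - 1], z[n] ** 2)
    return -math.log(float(ratio)) / math.log(1.0 - 1.0 / n**2)


if __name__ == "__main__":
    z = molten_partition(401, q=1)
    print(z[:7])
    for n in (50, 100, 200, 400):
        print(n, local_exponent(z, n))
```

With these starting values and $q = 1$ the sequence begins 1, 1, 2, 4, 9, 21, 51, and the printed estimates can be compared with the value 3/2 quoted above; finite-size corrections to the estimate shrink as $N$ grows.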

2.3. Translocation of random RNA sequences 

In the context of polymer translocation, it is necessary to determine what effect the 
pore has on the possible secondary structures of the molecule. If direct interactions 
with the pore are ignored, the only effect of the pore is that it divides the molecule 
into two segments, namely the trans part with $m$ bases and the cis part with $N - m$ 
bases. Each part of the molecule can still form RNA secondary structures, but base 
pairs between a base on the trans side and a base on the cis side become impossible. 
This constraint results in a free energy cost. In the entropically dominated molten phase 
a reduction in the number of possibilities for base pairing will decrease the entropy; in 
the energetically dominated glassy phase, a reduction in the number of possibilities to 
find well matching substrands will increase the energy. In both cases, the free energy 
cost provides a barrier to the translocation process, and we refer to the cost as the pinch 
free energy $F(m)$. The pinch free energy depends explicitly on our reaction coordinate 
$m$ and hence constitutes a free energy landscape for the translocation process‡. 

With the help of the partition function $Z_{i,j}$ introduced in the previous section, the 
pinch free energy can be easily calculated: The partition function for the RNA molecule 
at position $m$ in the pore has the product form $Z_{1,m} Z_{m+1,N}$ (the structures on the cis 
and trans sides are uncorrelated), whereas the partition function of the unconstrained 
RNA in solution is $Z_{1,N}$. The free energy difference between these states is the pinch 
free energy, 

$$F(m) = -k_B T \left[\ln\left(Z_{1,m} Z_{m+1,N}\right) - \ln Z_{1,N}\right]. \qquad (11)$$

Using the definition of the partition function, we can also establish the explicit link of 
the pinch free energy landscape to the translocation dynamics model of Sec. 2.1. To 
this end, we need to determine the probabilities $w_{\mathrm{cis}}(m)$ and $w_{\mathrm{trans}}(m)$ in (4). Since 
$Z_{i,j}$ represents the total statistical weight of all permitted basepairing patterns for the 
RNA substrand from base $i$ to base $j$, the probability $w_{\mathrm{cis}}(m)$ for the base immediately 
in front of the pore on the cis side to be unpaired is given by 

$$w_{\mathrm{cis}}(m) = \frac{Z_{m+2,N}}{Z_{m+1,N}}. \qquad (12)$$

Similarly, the probability for the base immediately in front of the pore on the trans side 
to be unpaired is given by 

$$w_{\mathrm{trans}}(m) = \frac{Z_{1,m-1}}{Z_{1,m}}. \qquad (13)$$
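
The pinch free energy and the two unpaired-base probabilities can all be read off one partition function table. The sketch below illustrates this under explicit assumptions: 0-based, half-open indices (`z[i][j]` covers bases `i` to `j - 1`), the simplified match/mismatch energy model, and hypothetical function names. It also checks numerically that the ratio of the forward weight at $m$ to the backward weight at $m + 1$ equals the Boltzmann factor of the pinch free energy difference, which is the sense in which hopping takes place in the landscape $F(m)$.

```python
import math
import random

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def partition_table(seq, eps_m=1.0, eps_mm=1.0, kT=1.0):
    """z[i][j] = partition function of bases i .. j-1 (0-based, half-open)."""
    n = len(seq)
    z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            total = z[i][j - 1]  # last base unpaired
            for k in range(i, j - 1):  # last base paired with base k
                eps = -eps_m if (seq[k], seq[j - 1]) in WATSON_CRICK else eps_mm
                total += z[i][k] * math.exp(-eps / kT) * z[k + 1][j - 1]
            z[i][j] = total
    return z


def pinch_free_energy(z, m, kT=1.0):
    """Free energy cost of forbidding pairs between the first m and the last N - m bases."""
    n = len(z) - 1
    return -kT * (math.log(z[0][m] * z[m][n]) - math.log(z[0][n]))


def w_cis(z, m):
    """Probability that the first base of the cis part is unpaired (m = 0 .. N-1)."""
    n = len(z) - 1
    return z[m + 1][n] / z[m][n]


def w_trans(z, m):
    """Probability that the last base of the trans part is unpaired (m = 1 .. N)."""
    return z[0][m - 1] / z[0][m]


if __name__ == "__main__":
    random.seed(0)
    kT = 0.5
    seq = "".join(random.choice("ACGU") for _ in range(40))
    z = partition_table(seq, kT=kT)
    n = len(seq)
    landscape = [pinch_free_energy(z, m, kT) for m in range(n + 1)]
    print("largest pinch free energy:", max(landscape))
    # Detailed balance between neighbouring values of m:
    m = n // 2
    print(w_cis(z, m) / w_trans(z, m + 1),
          math.exp(-(landscape[m + 1] - landscape[m]) / kT))
```

The landscape vanishes at $m = 0$ and $m = N$ by construction, because in those two states the pore does not split the molecule.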



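Together, (2), (8), (12), and (13) fully specify the translocation dynamics of 
structured RNA molecules within our model. The general form (5) for the average 
translocation time then simplifies to 

$$\langle \tau \rangle = \frac{1}{k_0} \sum_{m=m_0}^{N-1} \sum_{\ell=0}^{m} e^{-(m-\ell+\eta)\, q_{\mathrm{eff}} V / k_B T}\; \frac{Z_{1,\ell}\, Z_{\ell+1,N}}{Z_{1,m}\, Z_{m+2,N}} = \frac{1}{k_0} \sum_{m=m_0}^{N-1} \sum_{\ell=0}^{m} \frac{1}{w_{\mathrm{cis}}(m)} \exp\left[\frac{F(m) - F(\ell) - (m-\ell+\eta)\, q_{\mathrm{eff}} V}{k_B T}\right] \qquad (14)$$

using the free energy $F(m)$ as defined in (11). It is now evident that the translocation 
dynamics of Sec. 2.1 corresponds to a random walk in the pinch free energy landscape 
which is tilted by the applied voltage. 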

‡ To keep the notation concise, we suppress the dependence of the pinch free energy $F(m)$ on the total 
sequence length $N$. 

Equation (11) can be used to compute the free energy landscape for a specific 
RNA sequence. To characterize the typical translocation behavior of structured RNA 
molecules, we need to generate such landscapes for a large sample from the ensemble of 
random sequences. We will take this numerical approach in section 3.1. However, using 
(10), we can analytically determine the typical form $F_{\mathrm{molten}}(m)$ of the landscape in the 
molten phase, up to an additive constant that does not depend on $m$, 

$$F_{\mathrm{molten}}(m) = -k_B T \ln\frac{Z(m)\, Z(N-m)}{Z(N)} \approx -k_B T \ln\frac{m^{-\gamma_m} (N-m)^{-\gamma_m}}{N^{-\gamma_m}} = \gamma_m k_B T \ln\left[\frac{m (N - m)}{N}\right]. \qquad (15)$$

This is formally the same logarithmic free energy landscape as for the translocation of 
unstructured polymers, (1). However, its physical origin is completely different (namely, 
the structural entropy of base pairing configurations rather than the positional entropy 
of the backbone), and its prefactor $\gamma_m = 3/2$ is larger, which will be important below. 
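
To get a feeling for how the prefactor of a logarithmic landscape influences the passage time, one can evaluate the mean first passage time numerically at zero voltage. The sketch below is a simplified illustration, not the calculation of this paper. It assumes a constant forward rate `k0` with backward rates fixed by detailed balance in the landscape, for which the mean first passage time is the double sum of $e^{[F(m) - F(\ell)]/k_B T}/k_0$ over $\ell \le m$, and it starts the walk at $m = 1$ with a reflecting wall there to avoid the logarithmic divergence at $m = 0$. All function names are hypothetical.

```python
import math


def log_landscape(n, gamma):
    """F(m) / k_B T = gamma * ln[m (N - m) / N] for m = 1 .. N-1 (index 0 unused)."""
    return [0.0] + [gamma * math.log(m * (n - m) / n) for m in range(1, n)]


def mean_time_zero_voltage(n, gamma, k0=1.0):
    """Mean first passage time from m = 1 to m = N in the logarithmic landscape.

    Assumes a constant forward rate k0, backward rates set by detailed balance,
    and a reflecting wall at m = 1 (an assumption made to regularise m = 0).
    """
    f = log_landscape(n, gamma)
    total = 0.0
    inner = 0.0  # running sum over l <= m of exp(-F(l) / k_B T)
    for m in range(1, n):
        inner += math.exp(-f[m])
        total += math.exp(f[m]) * inner
    return total / k0


def local_scaling_exponent(n, gamma):
    """Effective exponent d ln(tau) / d ln(N), estimated between N = n and N = 2n."""
    return math.log(mean_time_zero_voltage(2 * n, gamma)
                    / mean_time_zero_voltage(n, gamma)) / math.log(2.0)


if __name__ == "__main__":
    for gamma in (0.0, 0.5, 1.5, 2.5):
        print(gamma, [round(local_scaling_exponent(n, gamma), 3) for n in (100, 400, 1600)])
```

For `gamma = 0` the landscape is flat and the sum reduces to $N(N-1)/2$, the purely diffusive case. The printed effective exponents for larger prefactors can be compared against this baseline; they are finite-size estimates and should be read as indicative only.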

It is interesting to note that the logarithmic behavior of (15) and the value of the 
prefactor can be physically understood by realizing that the ensemble of secondary 
structures in the molten phase corresponds to the ensemble of (rooted) branched 
polymers: The number of possible configurations of a rooted branched polymer of 
molecular weight $m$ is known to scale like $m^{-3/2}$ [51] (in addition to the non-universal 
extensive factor) and thus the pinch free energy landscape of a translocating RNA 
molecule in the molten phase is the same as the landscape generated by cutting a 
branched polymer of molecular weight $N$ into two rooted branched polymers of molecular 
weights $m$ and $N - m$, respectively [3^ . 

In the glassy phase the situation is much less clear, since there are no analytical 
calculations of the typical pinch free energy for the ensemble of random RNA sequences. 
Furthermore, different numerical studies [Mll^ HSj. which examined the maximal pinch 
(at m = N/2), disagree whether F{N/2) scales logarithmically or as a small power with 
the sequence length N. One numerical argument in favor of a logarithmic dependence is 
that different choices of the sequence disorder yield different prefactors of the logarithm 
or different exponents in the power law. While different prefactors of the logarithm 
are not problematic, exponents that depend on the choice of the disorder contradict 
the notion that exponents should be universal. In [31] the dependence of the maximal 
pinch free energy on sequence length and temperature was studied in detail and it was 
found that the dependence of the maximal pinch free energy on sequence length can be 
described rather well by a logarithmic law for all temperatures. At high temperatures, 
the prefactor $a(T)$ of the logarithmic dependence is $\frac{3}{2} k_B T$, as expected. However, at 
low temperatures this prefactor ceases to be proportional to temperature and converges 
toward a finite value at zero temperature. Thus, if we assume that the entire averaged 
pinch free energy landscape still has the logarithmic form of (1) in the glassy phase, the 
logarithmic dependence of its maximum on sequence length implies that the effective 
prefactor 

$$\gamma(T) = \frac{a(T)}{k_B T} \qquad (16)$$

equals 3/2 at high temperatures and diverges as the temperature is lowered below the 







[Figure 3 plot; horizontal axis: $k_B T/\varepsilon_m$.] 



Figure 3. Numerically determined prefactor $\gamma(T)$ of the logarithmic free energy 
landscape (1) as a function of temperature (most of the data from [34]). The statistical 
error of the data is on the order of the symbol size. It can be seen that the prefactor is 
constant at 3/2 in the high temperature (molten) phase. In the low temperature (glassy) 
phase the prefactor becomes temperature dependent and diverges. The prefactors were 
determined by generating many random RNA sequences with equal probability of the 
four bases of lengths $N$ = 160, 320, 640, and 1280, calculating the restricted partition 
functions for the energy model (7) with $\varepsilon_m = \varepsilon_{mm}$ via (8) and extracting the pinch 
free energies at $m = N/2$ via (11). The prefactor $a(T)$ of the logarithmic law for the 
pinch free energy was determined by fitting such a logarithmic law to the numerical 
data and the corresponding prefactor of the logarithmic free energy landscape $\gamma(T)$ 
was calculated from (16). 



glass transition temperature. For random sequences with equal probability for all four 
bases and energies $\varepsilon_m = \varepsilon_{mm}$ this behavior is numerically illustrated in figure 3. 

3. Results and discussion 
3.1. Numerical analysis 

From the arguments in section 2 one may expect that the translocation of structured 
RNA molecules can be described as a one-dimensional diffusion process in the 
logarithmic energy landscape (1) with a temperature dependent, potentially large 
prefactor $\gamma$. Of course, the scaling of the translocation time with sequence length in 
such a landscape can be derived analytically as a function of the prefactor $\gamma$. However, 
there are several uncertainties in this description. First, different numerical studies of 
the pinch free energy of random RNA molecules do not even agree on whether 

Figure 4. Numerically determined average free energy landscapes for translocation 
of RNA molecules through a nanopore. For clarity, the numerically determined free 
energies are averaged over ranges of the reaction coordinate m of size 40. The statistical 
errors on the data are smaller than the size of the symbols. It can be seen that the 
average free energy landscape follows the logarithmic shape (1) with prefactors $\gamma k_B T$, 
where $\gamma$ is taken from figure 3, up to irrelevant additive constants. 



the maximum of the landscape at m = N/2 scales logarithmically or with a small power 
of the sequence length in the glass phase [311 1121 112]. Second, even if the maximum of 
the landscape scales logarithmically, it has not been established that the whole average 
landscape follows the simple shape (1). Third, even if the average landscape has the 
suggested shape, the landscape of any given RNA molecule can differ significantly from 
the average landscape. Thus, it is not obvious how the ensemble of translocation times 
of actual landscapes is related to the translocation time over the average landscape. 

To clarify these points, we perform a detailed numerical study of the translocation 
dynamics of random RNA molecules on the basis of the model defined in sections 2.1 and 
2.3. We generate free energy landscapes for 2500 different RNA molecules of length 
N = 1600 using the partition function recursion§ (8) and the definition (11) of the pinch 
free energy. The RNA sequences are random and uncorrelated, with equal probabilities 
of 1/4 for each of the four bases. We use the energy model with $\varepsilon_m = \varepsilon_{mm}$ and quote 
all energies in units of $\varepsilon_m$. 

First of all, our ensemble of free energy landscapes allows us to directly inspect the 
shape of the average landscape. Figure 4 shows the pinch free energy landscape F(m) 
averaged over the 2500 realizations of the random sequences (symbols). Superimposed as 

§ During the calculation, the auxiliary partition functions $Z_{ij}$ are rescaled to avoid numerical overflows 
due to the large exponential factors at low temperatures. 

Figure 5. Histogram of thermally averaged translocation times for 2500 random 
sequences of length N = 1600 at temperatures $k_B T/\varepsilon_m = 0.2$ and $k_B T/\varepsilon_m = 1$. Note 
that counts in the histogram for $k_B T/\varepsilon_m = 1$ are rescaled by a factor of 10 to fit into 
the same plot as the histogram for $k_B T/\varepsilon_m = 0.2$. It can be seen that the distribution 
already at $k_B T/\varepsilon_m = 1$ spans several decades. At the low temperature $k_B T/\varepsilon_m = 0.2$ 
the distribution develops a very fat tail consisting of few sequences with very long 
translocation times. 



lines are logarithmic energy landscapes as given by (1) with prefactors $\gamma k_B T$ that are directly 
obtained by multiplying the values of $\gamma$ shown in figure 3 by $k_B T$. These energy landscapes 
are shifted by fitted offsets which reflect the behavior of the pinch free energy at very 
small m and which are irrelevant for the scaling behavior of the translocation dynamics. 
The comparison indicates that the overall shape of the average free energy landscapes 
is indeed the logarithmic one, even in the glassy temperature regime of figure 3 where 
$\gamma(T)$ is significantly larger than 3/2. 

Next, we examine the translocation times. For each sequence, we calculate the 
thermal average of the translocation time $\langle\tau\rangle$ using the exact expression given earlier. The most 
straightforward quantity to extract from the 2500 translocation times thus obtained 
would be the ensemble averaged translocation time $\overline{\langle\tau\rangle}$, where the bar denotes averaging 
over the sequence ensemble. However, as frequently observed in disordered systems, the 
distribution of characteristic times develops a fat tail at low temperatures, which renders 
the ensemble averaged translocation time ill defined (see figure 5). Instead, we must use 
a definition for the typical value that does not rely on the existence of the mean value. 
For instance, the median or the average of the logarithm both provide a well-defined 
typical value even for fat-tailed distributions. Here, we use the average of the logarithm 

Table 1. Exponents of the power law dependence of the typical translocation time 
on the length of the molecule. These exponents are determined by linear regression of 
the data in figure 6. 

$k_B T/\varepsilon_m$    exponent
0.1          14.05 ± 0.08
0.13          9.62 ± 0.03
0.17          6.23 ± 0.04
0.2           4.79 ± 0.06
0.3           2.94 ± 0.06
0.6           2.44 ± 0.02
0.8           2.43 ± 0.01
1.0           2.44 ± 0.01

of the translocation time, which can be interpreted as a typical effective energy barrier. 
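
The difference between the ensemble mean and the two typical-value definitions mentioned here (median and average of the logarithm) can be illustrated with a small numerical sketch. The Pareto-distributed sample below is purely synthetic and only stands in for a fat-tailed set of translocation times; the tail index `alpha`, the sample size, and the helper name `summarize_times` are assumptions made for this illustration and are not taken from the present study.

```python
import numpy as np

def summarize_times(times):
    """Return mean, median, and log-averaged typical value of positive times."""
    times = np.asarray(times, dtype=float)
    return {
        "mean": float(np.mean(times)),
        "median": float(np.median(times)),
        # typical value: exponential of the ensemble average of ln(tau)
        "typical": float(np.exp(np.mean(np.log(times)))),
    }

# Synthetic fat-tailed sample (assumption): classical Pareto with tail index
# alpha < 1, for which the mean of the underlying distribution does not exist.
rng = np.random.default_rng(0)
alpha = 0.8
times = 1.0 + rng.pareto(alpha, size=2500)   # P(tau > t) = t**(-alpha) for t >= 1

stats = summarize_times(times)
print(stats)
```

In such a sample the arithmetic mean is dominated by a handful of extreme values and keeps growing with the sample size, whereas the median and the log-averaged value stay of the same order and settle down quickly.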

Figure 6 shows the resulting ensemble averages of the logarithms of the translocation 
times for different sequence lengths of N = 50, 100, 200, 400, 800, and 1600 and for 
temperatures of $k_B T/\varepsilon_m$ = 0.1, 0.13, 0.17, 0.2, 0.3, 0.6, 0.8, and 1.0. Two features of 
figure 6 are immediately obvious. First, at all temperatures, the double logarithmic plot 
of translocation times as a function of sequence length is perfectly linear over the whole 
range of sequence lengths studied. Second, the slope of these lines is independent of 
temperature for large temperatures and increases sharply as the temperature is lowered. 

To quantify the power laws and their temperature dependence, we apply linear 
regression to our logarithmic data. The resulting slopes (i.e., exponents of the power 
law) are shown in table 1. It can be seen that all exponents are larger than two, i.e., the 
trivial diffusive scaling $\tau \sim N^2$ does not describe the translocation dynamics. In the 
high temperature regime, the exponent is clearly independent of temperature, while it 
becomes large and very sensitive to temperature in the low temperature regime. This 
salient feature in the translocation behavior of structured RNA molecules implies highly 
anomalous sub-diffusive dynamics for the translocation process. In the next section, we 
will discuss this behavior from a theoretical perspective. 
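
The regression step can be sketched as follows: the exponent is the slope of a straight-line fit of the ensemble-averaged logarithm of the translocation time against the logarithm of the sequence length. The data below are synthetic stand-ins, generated with an assumed exponent of 2.5 and log-normal scatter, and the function name `typical_time_exponent` is hypothetical; the sketch only illustrates the fitting procedure, not the actual data behind table 1.

```python
import numpy as np

def typical_time_exponent(lengths, times_by_length):
    """Slope and intercept of <ln tau> versus ln N (slope = power-law exponent)."""
    log_n = np.log(np.asarray(lengths, dtype=float))
    # ensemble average of the logarithm for each sequence length
    mean_log_tau = np.array([np.mean(np.log(t)) for t in times_by_length])
    slope, intercept = np.polyfit(log_n, mean_log_tau, 1)
    return float(slope), float(intercept)

# Synthetic stand-in data (assumption): tau = N**2.5 times log-normal scatter.
rng = np.random.default_rng(1)
lengths = [50, 100, 200, 400, 800, 1600]
true_exponent = 2.5
samples = [n**true_exponent * np.exp(rng.normal(0.0, 2.0, size=2500))
           for n in lengths]

exponent, _ = typical_time_exponent(lengths, samples)
print(f"fitted exponent: {exponent:.2f}")
```

Because the fit is done on averaged logarithms, broad sequence-to-sequence scatter shifts the intercept but does not bias the slope, which is why this definition of the typical time is convenient for extracting exponents.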

3.2. Anomalous scaling of the translocation times 

Given that the translocation dynamics on a free energy landscape of the logarithmic 
form (1) was previously studied, and its scaling behavior was found to be normally 
diffusive [23], our finding of anomalous scaling in the present case is surprising. Our 
numerical computation of the average free energy landscape shown in figure 4 indeed 
confirmed that the typical landscape for the translocation of structured RNA molecules 
has the logarithmic shape, as the theoretical arguments of Sec. 2.3 had suggested. To 
resolve the apparent contradiction, we now revisit the arguments of reference [23] that 
led to the diffusive scaling. 

Chuang, Kantor, and Kardar considered a continuum description of the 




Figure 6. Dependence of the typical translocation time on sequence length for several 
different temperatures. To avoid problems with fat tails of the distribution of the 
translocation times at low temperatures, the ensemble average of the logarithms of the 
translocation times is taken to determine the typical translocation time. The statistical 
errors on the translocation times are on the order of the size of the symbols. It can be 
seen that the typical translocation time has a very clean power law dependence over 
the whole range of sequence lengths and for all temperatures. The translocation time 
is independent of temperature for large temperatures and becomes very sensitive to 
temperature at low temperatures. 



translocation process, based on a Fokker-Planck equation similar to (3), but with the 
drift velocity v replaced by the local gradient of the free energy landscape, 

$$\frac{\partial p}{\partial t} = D\,\frac{\partial^2 p}{\partial x^2} + \frac{D}{k_B T}\,\frac{\partial}{\partial x}\left(p\,\frac{\partial F}{\partial x}\right), \qquad (17)$$

with $F(x) = \gamma k_B T \ln[(N - x)x/N]$. They note that the polymer length N and the 
diffusion constant can be eliminated from this equation by introducing a rescaled time 
$\tau = tD/N^2$ and translocation coordinate $s = x/N$, 

$$\frac{\partial p}{\partial \tau} = \frac{\partial^2 p}{\partial s^2} + \gamma\,\frac{\partial}{\partial s}\left(\frac{1 - 2s}{(1 - s)s}\,p\right),$$

where $p = p(s, \tau)$ now is the probability distribution in the rescaled variables. 
Consequently, the authors then argue that the solution of this dimensionless equation 
may be converted back to real time by multiplying the time axis by $N^2/D$, resulting 
in a diffusive scaling of the translocation time. Indeed, as the authors point out, the 
argument is independent of the value of $\gamma$. Application of the argument to the present 
case, with $\gamma > 3/2$, would suggest that the secondary structure of the RNA is irrelevant 
in the slow translocation limit considered here. 







Figure 7. Comparison of the landscape prefactors $\gamma(T)$ from figure 3 and the 
numerically determined translocation time exponents from table 1 for the temperatures 
$k_B T/\varepsilon_m$ where RNA is expected to be in the glassy phase. It can be seen that the 
observed translocation time exponents empirically behave like $2\gamma(T)$. 



However, we will now see that this conclusion cannot be drawn. The argument 
rests on the tacit assumption that the probability distribution $p = p(s, \tau)$ develops no 
structure on a microscopic scale. For instance, the continuum Fokker-Planck description 
breaks down if most of the probability is localized on one or a few points along the 
translocation coordinate m. Indeed, such a localization transition occurs if $\gamma$ exceeds a 
threshold value of one: Assuming a quasi-stationary solution to (17) which is localized 
at the s = 0 border is a self-consistent ansatz if p behaves as $p \sim s^{-\gamma}$ for small s. For 
$\gamma > 1$ the integral of this distribution diverges at the boundary, i.e., the free energy 
barrier to translocation becomes strong enough for the quasi-stationary distribution to 
localize at the boundary. 
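
The localization argument can be made concrete on a lattice. The sketch below assumes that the quasi-stationary weights are proportional to the Boltzmann factor of the logarithmic landscape, $[m(N-m)/N]^{-\gamma}$ on the interior sites m = 1, ..., N-1, and measures the fraction of the total weight that sits within a few sites of either end. The function name `boundary_fraction` and the cutoff of `k = 5` sites are hypothetical choices for illustration.

```python
import numpy as np

def boundary_fraction(N, gamma, k=5):
    """Fraction of the quasi-stationary weight within k sites of either end."""
    m = np.arange(1, N)                         # interior sites 1..N-1
    weight = (m * (N - m) / N) ** (-gamma)      # exp(-F(m)/k_B T), unnormalized
    near_edge = (m <= k) | (m >= N - k)
    return float(weight[near_edge].sum() / weight.sum())

for gamma in (0.5, 1.5, 3.0):
    fractions = [boundary_fraction(N, gamma) for N in (100, 1000, 10000)]
    print(gamma, [round(f, 3) for f in fractions])
```

For $\gamma < 1$ the boundary fraction shrinks as N grows, so the distribution is smooth on the scale of the chain and a continuum description is reasonable. For $\gamma > 1$ the fraction approaches an N-independent constant of order one, which is the lattice counterpart of the divergent integral of $s^{-\gamma}$ at the boundary.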

In the regime $\gamma > 1$ where the argument for the independence of the translocation 
time on $\gamma$ breaks down, the correct scaling behavior of the translocation time can 
be obtained using the standard Kramers rate theory for thermally-induced barrier 
crossing [iQl |52] . In the present case, this approach yields 

$$\tau \sim N\,e^{F(N/2)/k_B T} \sim N^{\gamma+1}. \qquad (18)$$

It is important to note that the barrier height itself according to (1) only yields a power 
law of $N^{\gamma}$ and that the additional power of $N$ results from the prefactor which is often 
ignored in applications of Kramers rate theory. 
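
The scaling $\tau \sim N^{\gamma+1}$ for $\gamma > 1$, and the diffusive scaling $\tau \sim N^2$ for $\gamma < 1$, can be checked by evaluating a mean first-passage time in the logarithmic landscape directly. The sketch below uses the standard double-sum (discretized double-integral) expression for one-dimensional diffusion with a reflecting start and an absorbing end. This textbook formula, the unit lattice spacing, the restriction to the interior sites 1, ..., N-1, and the function names are assumptions of the illustration and are not necessarily identical to the exact expression used for the numerical results of this paper.

```python
import numpy as np

def mean_first_passage_time(N, gamma, D=1.0):
    """Discretized MFPT across F(m) = gamma * k_B T * ln[m (N - m) / N]."""
    m = np.arange(1, N)                          # interior sites 1..N-1
    boltzmann = (m * (N - m) / N) ** gamma       # exp(+F(m)/k_B T)
    inner = np.cumsum(1.0 / boltzmann)           # sum over n <= m of exp(-F(n)/k_B T)
    return float(np.sum(boltzmann * inner) / D)

def scaling_exponent(gamma, lengths=(100, 200, 400, 800, 1600)):
    """Slope of ln(tau) versus ln(N) from a least-squares line fit."""
    taus = [mean_first_passage_time(N, gamma) for N in lengths]
    slope, _ = np.polyfit(np.log(lengths), np.log(taus), 1)
    return float(slope)

for gamma in (0.5, 1.5, 3.0):
    print(f"gamma = {gamma}: fitted exponent = {scaling_exponent(gamma):.2f}")
```

Under these assumptions the fitted exponent should come out close to 2 for $\gamma = 0.5$ and close to $1 + \gamma$ for $\gamma = 1.5$ and $\gamma = 3$. The inner sum converges to an N-independent constant when $\gamma > 1$, so the N-dependence comes from the outer sum over the barrier, which contributes $N^{\gamma}$ from the barrier height and one further power of N from its width.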

If we apply (18) to the molten phase where $\gamma = 3/2$, this yields $\tau \sim N^{2.5}$ in good 
agreement with our numerical estimates, see table 1. Furthermore, (18) rationalizes 
the sharp increase of the scaling exponent as the temperature is lowered into the glass 
phase, i.e., for $k_B T/\varepsilon_m < 0.2$. However, in that regime the quantitative comparison of 
the translocation time exponents in table 1 and the barrier heights in figure 3 shown in 
figure 7 reveals that the translocation time exponents increase even more dramatically 
than the increase of the landscape prefactors suggests, namely approximately like $2\gamma(T)$. 
This indicates that the typical translocation of individual molecules in the glass phase 
of RNA is not well approximated by translocation in the average landscape but rather 
must be dominated by the fluctuations of the free energy landscape around the average. 

4. Conclusion and outlook 

In conclusion, our numerical observation that the secondary structure drastically affects 
the scaling of the typical translocation time agrees qualitatively with the theoretical 
expectation, but quantitatively the effect is even larger than the theory predicts. For 
translocation in a logarithmic landscape it is clear from (18) that $\gamma = 1$ constitutes a 
threshold for a change in the translocation behavior: The regime $\gamma < 1$ is marked by an 
insignificant barrier, diffusive translocation, and failure of the Kramers approximation, 
which assumes a "reaction-limited" process and ignores the time required to diffuse from 
the starting to the end point. In contrast, for $\gamma > 1$ the barrier dominates the 
translocation dynamics and leads to the sub-diffusive scaling (18). Importantly, for 
unstructured polymers where the logarithmic free energy landscape is only due to the 
configurational entropy of the polymer, we have $\gamma < 1$ even if self-avoidance is included. 
Here, we found that the case of structured RNA molecules is always in the opposite regime 
of $\gamma > 1$. Thus, despite the similarity in the form of the free energy landscape (1), the 
translocation behavior of unstructured and structured polynucleotides is quite different. 

The anomalous scaling of translocation times found in our study is only observable 
in the absence of an external voltage. In the presence of an external voltage the gain 
in electrostatic energy due to moving N/2 bases into the pore is linear in N and thus 
for large N always overcomes the logarithmic barrier ([1]) leading to a linear dependence 
of the translocation time on sequence length. However, for finite but small voltages the 
anomalous scaling could still be observed in an intermediate regime of sequence lengths 
where Nq^eV/2 < $\gamma(T)$. 

Our empirical finding of even stronger anomalous scaling in the glassy phase than 
expected from the average free energy landscape indicates that translocation in the 
glassy phase is strongly affected by the fluctuations and the free energy landscapes 
of the individual RNA molecules. Understanding these fluctuations and the origin of 
the intriguing empirically found $2\gamma(T)$ law for the translocation time exponent will be 
the subject of future research. 